
ML-As-4

Problem 1: Neural Networks (20 pts)

Consider a 3-layer fully connected neural network with the following architecture:

  • Input layer: n = 4 neurons
  • Hidden layer: m = 3 neurons using a custom activation function $f(x) = \mathrm{ReLU}(x) + \sin(x)$
  • Output layer: k = 2 neurons using a softmax activation function $\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$

The network parameters (weights and biases) are given as:

  • $W_1 \in \mathbb{R}^{3 \times 4}$ and $b_1 \in \mathbb{R}^{3}$ for the hidden layer.
  • $W_2 \in \mathbb{R}^{2 \times 3}$ and $b_2 \in \mathbb{R}^{2}$ for the output layer.

Given the input vector $x \in \mathbb{R}^4$ and the target output $y \in \mathbb{R}^2$, define the loss function as the cross-entropy loss:

$$\text{Loss} = -\sum_{i=1}^{k} y_i \log(\hat{y}_i)$$

where $\hat{y}$ is the output after the softmax activation.
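
To make the setup concrete, here is a small NumPy sketch of the three ingredients above (the custom hidden activation, the softmax, and the cross-entropy loss). The function names are illustrative, not part of the assignment.

```python
import numpy as np

def f(x):
    """Custom hidden activation f(x) = ReLU(x) + sin(x), applied element-wise."""
    return np.maximum(x, 0.0) + np.sin(x)

def softmax(z):
    """Softmax over a 1-D vector z (shifted by max(z) for numerical stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y, y_hat):
    """Cross-entropy loss: -sum_i y_i * log(y_hat_i)."""
    return -np.sum(y * np.log(y_hat))
```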

Q1

1. Derive the equations for the forward pass through the network, including both the hidden and output layers. (3 pts)

Forward pass through the network:

  • Hidden layer:
$$z^{(1)} = W_1 x + b_1, \qquad h = f(z^{(1)}) = \mathrm{ReLU}(z^{(1)}) + \sin(z^{(1)})$$
  • Output layer:
$$z^{(2)} = W_2 h + b_2, \qquad \hat{y} = \sigma(z^{(2)}), \quad \hat{y}_i = \frac{e^{z_i^{(2)}}}{\sum_{j=1}^{2} e^{z_j^{(2)}}}$$
  • Loss:
$$\text{Loss} = -\sum_{i=1}^{2} y_i \log(\hat{y}_i)$$
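
These equations translate directly into a short forward-pass routine. A minimal sketch using the helper functions defined earlier (the variable and function names are illustrative):

```python
def forward(x, W1, b1, W2, b2):
    """Forward pass; returns the intermediate quantities needed later for backprop."""
    z1 = W1 @ x + b1      # net input to the hidden layer
    h = f(z1)             # hidden activation ReLU(z1) + sin(z1)
    z2 = W2 @ h + b2      # net input to the output layer
    y_hat = softmax(z2)   # predicted class probabilities
    return z1, h, z2, y_hat
```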

2. Calculate the outputs $Z_1$, $H$, $Z_2$, and $\hat{y}$ explicitly for the given input $x = [1, 1, 0.5, 2]^T$ and the following initial weights and biases:

$$W_1 = \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.5 & 0.3 & 0.1 & 0.2 \\ 0.4 & 0.2 & 0.5 & 0.3 \end{pmatrix}, \quad b_1 = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.05 \end{pmatrix}, \quad W_2 = \begin{pmatrix} 0.3 & 0.2 & 0.1 \\ 0.4 & 0.5 & 0.3 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 0.05 \\ 0.05 \end{pmatrix}$$

  • Note that $Z_1$ is the net input to the hidden layer, $H$ is the activation output of the hidden layer, and $Z_2$ is the net input to the output layer. (3 pts)

$$Z_1 = W_1 x + b_1 = \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.5 & 0.3 & 0.1 & 0.2 \\ 0.4 & 0.2 & 0.5 & 0.3 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0.5 \\ 2 \end{pmatrix} + \begin{pmatrix} 0.1 \\ 0.1 \\ 0.05 \end{pmatrix} = \begin{pmatrix} 1.35 \\ 0.35 \\ 0.6 \end{pmatrix}$$

$$H = \mathrm{ReLU}(Z_1) + \sin(Z_1) = \begin{pmatrix} 2.325 \\ 0.692 \\ 1.164 \end{pmatrix}$$

$$Z_2 = W_2 H + b_2 = \begin{pmatrix} 0.3 & 0.2 & 0.1 \\ 0.4 & 0.5 & 0.3 \end{pmatrix} \begin{pmatrix} 2.325 \\ 0.692 \\ 1.164 \end{pmatrix} + \begin{pmatrix} 0.05 \\ 0.05 \end{pmatrix} = \begin{pmatrix} 0.3927 \\ 0.8832 \end{pmatrix}$$

$$\hat{y} = \sigma(Z_2) = \begin{pmatrix} 0.2181 \\ 0.7819 \end{pmatrix}$$
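
For reference, the same arithmetic can be run with the `forward` sketch above. The arrays below simply transcribe the parameter values as printed in the problem statement (in particular, the signs of the entries are assumed to be exactly as shown), so the output can be compared against the hand computation.

```python
x  = np.array([1.0, 1.0, 0.5, 2.0])        # given input vector

W1 = np.array([[0.1, 0.2, 0.3, 0.4],       # hidden-layer weights, as printed above
               [0.5, 0.3, 0.1, 0.2],
               [0.4, 0.2, 0.5, 0.3]])
b1 = np.array([0.1, 0.1, 0.05])            # hidden-layer biases

W2 = np.array([[0.3, 0.2, 0.1],            # output-layer weights, as printed above
               [0.4, 0.5, 0.3]])
b2 = np.array([0.05, 0.05])                # output-layer biases

Z1, H, Z2, y_hat = forward(x, W1, b1, W2, b2)
print(Z1, H, Z2, y_hat)
```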

Q2

Derive the gradient of the loss with respect to each parameter ($W_1$, $b_1$, $W_2$, $b_2$) in the network and obtain the gradient values using the results from the first question. Use matrix calculus to express the gradients. Hint: you can first calculate the error terms $\delta_2$ and $\delta_1$ for each layer and use them to express the gradients. (10 pts)

Error term $\delta_2$

$$\delta_2 = \hat{y} - y$$

Gradient of the loss with respect to $W_2$

$$\frac{\partial \text{Loss}}{\partial W_2} = \frac{\partial \text{Loss}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial Z_2} \cdot \frac{\partial Z_2}{\partial W_2} = \delta_2 H^T$$

Gradient of the loss with respect to $b_2$

$$\frac{\partial \text{Loss}}{\partial b_2} = \frac{\partial \text{Loss}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial Z_2} \cdot \frac{\partial Z_2}{\partial b_2} = \delta_2$$

Error term $\delta_1$

$$\delta_1 = \frac{\partial \text{Loss}}{\partial H} \odot \frac{\partial H}{\partial Z_1} = (W_2^T \delta_2) \odot f'(Z_1), \qquad f'(Z_1) = \mathbb{1}[Z_1 > 0] + \cos(Z_1)$$

where $\odot$ denotes the element-wise product and $f'$ is the derivative of the hidden activation $f(x) = \mathrm{ReLU}(x) + \sin(x)$.

Gradient of the loss with respect to $W_1$

$$\frac{\partial \text{Loss}}{\partial W_1} = \frac{\partial \text{Loss}}{\partial H} \cdot \frac{\partial H}{\partial Z_1} \cdot \frac{\partial Z_1}{\partial W_1} = \delta_1 x^T$$

Gradient of the loss with respect to $b_1$

$$\frac{\partial \text{Loss}}{\partial b_1} = \frac{\partial \text{Loss}}{\partial H} \cdot \frac{\partial H}{\partial Z_1} \cdot \frac{\partial Z_1}{\partial b_1} = \delta_1$$
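
These expressions map one-to-one onto code. Below is a sketch of a backward pass that computes $\delta_2$, $\delta_1$, and the four gradients from the forward-pass quantities; the function and variable names are illustrative.

```python
def backward(x, y, z1, h, y_hat, W2):
    """Gradients of the cross-entropy loss w.r.t. W1, b1, W2, b2."""
    # Output-layer error term: softmax + cross-entropy gives delta2 = y_hat - y.
    delta2 = y_hat - y

    # Output-layer gradients: dLoss/dW2 = delta2 H^T, dLoss/db2 = delta2.
    dW2 = np.outer(delta2, h)
    db2 = delta2

    # Derivative of the hidden activation f(z) = ReLU(z) + sin(z),
    # i.e. f'(z) = 1[z > 0] + cos(z), applied element-wise.
    f_prime = (z1 > 0).astype(float) + np.cos(z1)

    # Hidden-layer error term: delta1 = (W2^T delta2) ⊙ f'(z1).
    delta1 = (W2.T @ delta2) * f_prime

    # Hidden-layer gradients: dLoss/dW1 = delta1 x^T, dLoss/db1 = delta1.
    dW1 = np.outer(delta1, x)
    db1 = delta1

    return dW1, db1, dW2, db2
```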

Q3

Suppose the learning rate is $\alpha = 0.001$. Calculate the updated parameter values after one backpropagation step. (4 pts)

The general update rule for each parameter $\theta$ with gradient descent is:

$$\theta \leftarrow \theta - \alpha \frac{\partial \text{Loss}}{\partial \theta}$$

where $\alpha$ is the learning rate.

Updating $W_2$:

$$W_2^{\text{new}} = W_2 - \alpha \frac{\partial \text{Loss}}{\partial W_2} = W_2 - 0.001\, \delta_2 H^T$$

Updating $b_2$:

$$b_2^{\text{new}} = b_2 - \alpha \frac{\partial \text{Loss}}{\partial b_2} = b_2 - 0.001\, \delta_2$$

Updating $W_1$:

$$W_1^{\text{new}} = W_1 - \alpha \frac{\partial \text{Loss}}{\partial W_1} = W_1 - 0.001\, \delta_1 x^T$$

Updating $b_1$:

$$b_1^{\text{new}} = b_1 - \alpha \frac{\partial \text{Loss}}{\partial b_1} = b_1 - 0.001\, \delta_1$$
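
Putting the pieces together, one gradient-descent step with $\alpha = 0.001$ can be sketched as follows, reusing the `forward` and `backward` helpers above. The target vector `y` below is only a hypothetical placeholder, since its numerical value is not listed here.

```python
alpha = 0.001
y = np.array([0.0, 1.0])   # hypothetical one-hot target; use the actual y from the assignment

Z1, H, Z2, y_hat = forward(x, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(x, y, Z1, H, y_hat, W2)

# One vanilla gradient-descent update per parameter.
W1_new = W1 - alpha * dW1
b1_new = b1 - alpha * db1
W2_new = W2 - alpha * dW2
b2_new = b2 - alpha * db2
```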